Generalizability

Abstract

When class is the unit of analysis, estimates of the
dependability of class means are frequently required.
Using classical test theory, it is difficult to treat this
problem adequately. In this paper, we consider the
dependability of class means by applying generalizability theory
to a split-plot design in which students are nested within
classes. Using the split-plot design, we obtain four distinct
generalizability coefficients. We then compare these
four coefficients with each other and with three previously
reported reliability coefficients. We find that each of
the three reliability coefficients is related to one or more
generalizability coefficients. However, none of the
reliability coefficients is equivalent to the generalizability
coefficient which is, in our judgment, usually the most
appropriate coefficient for describing the dependability of
class means.
The Generalizability of Class Means

Introduction

Recently, a number of researchers have given serious
consideration to the problem of estimating reliability when
the unit of analysis is a class mean or some other aggregate
score for a set of persons. Haney (1974a, 1974b) has reviewed
important aspects of the relevant literature.

The study of this topic has been motivated by the
analysis of data from several different sources. In particular,
large scale evaluations, such as those undertaken for Head
Start (see Smith & Bissell, 1970) and Follow Through (see Abt
Associates, 1974), frequently require estimates of reliability
when class is the unit of analysis. Similar issues arise
in the study of course evaluation questionnaires (see Kane,
Gillmore, & Crooks, 1974).

Using concepts from classical reliability theory,
Shaycroft (1962), Wiley (1970), and Thrash and Porter (1974)
have developed three different coefficients for estimating the
reliability of class means. The procedures used to develop
these three coefficients all assume that an observed score
is the sum of a true score and an undifferentiated error
term. However, each of these procedures makes different
specific assumptions about what constitutes an appropriate
estimate of the error variance. As a result, each procedure
gives a different estimate of the reliability of class means.
Since the three procedures will not, in general, lead to
even approximately equal estimates of the reliability
of class means, it is of considerable importance to determine
the appropriate coefficient for any particular application.

Within the context of classical reliability theory,
it is difficult to compare the three procedures and arrive
at reasonable conclusions about their relative merits.
Difficulty in comparing the proposed reliability coefficients
exists because these coefficients are derived from statistical
models which are based on different assumptions. In particular,
the variance attributed to error arises from different sources
in the three models. However, the
coefficients can be compared directly within the context
of a more comprehensive and detailed model.

Brennan (in press), Kane et al. (1974), and Haney (1974a)
have suggested that the reliability of class means be
approached through generalizability theory, as explicated by
Cronbach, Gleser, Nanda, and Rajaratnam (1972). Generalizability
theory extends reliability theory by allowing for a
multidimensional interpretation of error which, in turn, provides
a much more systematic and precise method for studying the
dependability of class means.

In this paper, we consider the dependability of class
means by applying generalizability theory to a split-plot
design in which students are nested within classes.
Using the split-plot design, we obtain four distinct
generalizability coefficients. We then compare these four
generalizability coefficients with each other and with the
three previously reported reliability coefficients. We find
that each of the three reliability coefficients is related
to one or more generalizability coefficients. However, none
of the reliability coefficients is equivalent to the
generalizability coefficient which is, in our judgment,
usually the most appropriate coefficient for describing the
dependability of class means.

Generalizability Theory

For a thorough presentation of generalizability theory,
see The Dependability of Behavioral Measurements (Cronbach
et al., 1972). A briefer introduction to many of the basic
ideas is found in Lindquist (1953). Here we discuss some
concepts from generalizability theory that relate to our
subsequent treatment of the dependability of class means.

Overview

The purpose of both generalizability theory and reliability
theory is to characterize the dependability of measurements.
Classical reliability theory assumes that errors of measurement
are sampled from an undifferentiated univariate distribution.
In order to estimate the proportion of observed score variance
attributable to error, reliability theory uses correlations and
one-way analysis of variance (ANOVA). By contrast,
generalizability theory recognizes the existence of multiple sources of
error, and allows for the use of any ANOVA design in order to
estimate the magnitude of the variance components.

Although generalizability theory borrows its statistical
models and research designs from ANOVA, there are some changes
in terminology and interpretation. ANOVA is typically used to
test the statistical significance of hypotheses; the mean squares
or estimated components of variance are not of primary interest.
In generalizability theory the emphasis is reversed. The
components of variance and the coefficients of generalizability
computed from these variance components are interpreted as
descriptive statistics; statistical significance plays no
essential role. Cronbach et al. (1972, p. 192) actually advise
against the use of tests of significance for the variance
components.

Terminology

In generalizability theory, any observation on some unit
of analysis (e.g., school, class, or student) is taken from a
larger set or universe of observations. Any observation from
this universe can be characterized by the conditions under which
it is made. The set of all possible conditions of a particular
kind is called a facet. For example, when class is the unit of
analysis, the conditions of observation are characterized by
an item facet and a student facet. This terminology is slightly
different from that typically used in statistics, where classes,
items, and students would all be referred to as dimensions. The
use of the term "facet" in generalizability theory serves to
emphasize the distinction between the unit of analysis which is
being observed and the facets, which indicate the conditions
under which the observations are made.

Generalizability theory also emphasizes the distinction
between G studies, which examine the dependability of some
measurement procedure, and D studies, which provide the data
for substantive decisions. The purpose of the G study is to
estimate components of variance, which may then be used to
estimate generalizability coefficients for a variety of
D studies. The G study and the D study may be the same study,
or they may be different studies using the same design.
Generally, however, G studies are most useful when they employ
complex designs and large sample sizes to provide stable
estimates of as many variance components as possible. These
components can then be used to estimate generalizability
coefficients for various D study designs before any D study is
implemented.

In the discussion that follows, it will be useful to
employ some additional terminology from generalizability theory.
According to Cronbach et al. (1972),

    [T]he test developer or other investigator who carries out
    a G study takes certain facets into consideration and, with
    respect to each facet, considers a certain range of conditions.
    The observations encompassed by the possible combinations
    of conditions that the G study represents is called the
    universe of admissible observations. We may also speak of
    the universe of admissible conditions of a certain facet.

    A decision maker, applying essentially the same
    measuring technique, proposes to generalize to some
    universe of conditions all of which he sees as eliciting
    samples of the same information. We refer to that as
    the universe of generalization. The G study can serve
    this decision maker only if its universe of admissible
    conditions is identical to or includes the proposed
    universe of generalization. Different decision makers
    may propose different universes of generalization. A
    G study that defines the universe of admissible
    observations broadly, encompassing all the likely universes
    of generalization, will be useful to various decision
    makers. (p. 20)

The decision maker is generally interested in the mean
of the observations over the universe of generalization, called
the universe score. Universe scores are not directly observable
and are usually estimated by the mean over some sample of
observations. The dependability of these estimates is reflected by
the value of a generalizability coefficient.

Coefficients of generalizability are defined as the ratio
of the universe score variance to the expected observed score
variance. These coefficients are essentially intraclass
correlation coefficients with universe score variance replacing
the true score variance of classical test theory (Cronbach,
Ikeda, & Avner, 1964). The observed score variance has
essentially the same interpretation in both formulations.

Background

Many of the ideas that underlie generalizability
theory are not new. In fact, Lindquist (1953) and, to a
lesser extent, Hoyt (1941) suggested procedures for calculating
reliability that foreshadow the approach of generalizability
theory. Using a randomized block design and basic principles
from reliability and analysis of variance, both Lindquist and
Hoyt calculated intraclass correlation coefficients
(generalizability coefficients) that are algebraically equivalent to
Kuder and Richardson's (1937) Formula 20 and Cronbach's (1951)
Coefficient $\alpha$ (see Brennan, in press).

Generalizability theory extends the work of Hoyt and
Lindquist to multi-faceted statistical and measurement models.
In subsequent sections of this paper we concentrate upon one
such model, the split-plot design, as a basis for considering
the dependability of class means.

The Split-Plot Design as a Model for
the Generalizability of Class Means

Split-Plot Design

In most situations, when the dependability of class means
is under study, the design is that designated by Cronbach et al.
(1972) as design V-B. This design is often referred to as a
split-plot design in standard experimental design texts (e.g.,
Kirk, 1968, and Winer, 1971). In this design, students are
nested within classes and crossed with items. Thus, each class
contains a different set of students, but the same set of items
is administered to the students in all classes.


The structural model for this design is:

$$X_{csi} = \mu + \alpha_c + \pi_{s(c)} + \beta_i + \alpha\beta_{ci} + \beta\pi_{is(c)} + \epsilon_{o(csi)} \tag{1}$$

where

$\mu$ = grand mean,
$\alpha_c$ = effect for class c (c = 1, 2, ..., $n_c$),
$\pi_{s(c)}$ = effect for student s (s = 1, 2, ..., $n_s$)
    nested within class c,
$\beta_i$ = effect for item i (i = 1, 2, ..., $n_i$),
$\alpha\beta_{ci}$ = class by item interaction,
$\beta\pi_{is(c)}$ = item by person (nested within class) interaction, and
$\epsilon_{o(csi)}$ = experimental error (o is a replication subscript).

Following Kirk (1968), the subscript c referring to the nested
treatment, class, is placed within parentheses. To simplify
our discussion, we will assume in this paper that the number of
students within a class, $n_s$, is a constant over all classes.

Factorial Design

The nesting of students within classes indicated in
Equation 1 results in a confounding of the effects due to
classes and students. The implications of this confounding
in the split-plot design are evident from a consideration of
the analogous three-way factorial design in which classes,
students, and items are all crossed:

$$X_{csi} = \mu + \alpha_c + \pi_s + \alpha\pi_{cs} + \beta_i + \alpha\beta_{ci} + \beta\pi_{is} + \alpha\beta\pi_{cis} + \epsilon_{o(csi)} \tag{2}$$

If the factorial design were appropriate, then every student
would appear in each and every class, and the student effect
could be estimated independently of the class effect.

Confounding in the Split-Plot Design

The differences between the model equations for the
split-plot and factorial designs are attributable to the fact
that in the factorial design (Equation 2) students are crossed
with classes, and in the split-plot design (Equation 1)
students are nested within classes. This nesting in the
split-plot design results in a confounding of at least two sets of
effects represented in Equation 2.

First, the student main effect is confounded with the class
by student interaction; that is, $\pi_{s(c)}$ in the split-plot design
represents the combined contribution of the student effect and
the student by class interaction, $\pi_s + \alpha\pi_{cs}$, in the factorial
design. For the split-plot design, each student is observed
in only one class; thus, there is no way to estimate the
student effect independent of the class by student interaction.

Similarly, the student by item interaction is confounded
with the class by student by item interaction; thus, $\beta\pi_{is(c)}$
in the split-plot design takes the place of $\beta\pi_{is} + \alpha\beta\pi_{cis}$
in the factorial design.

Also, in the absence of replications, there is
another type of confounding in both designs. When there is
only one observation for a given student responding to a given
item, then, for Equation 1, the error term, $\epsilon_{o(csi)}$, is
confounded with the item by student (nested within class)
interaction, $\beta\pi_{is(c)}$. In this case, $\beta\pi_{is(c)}$ in the split-plot
design replaces $\beta\pi_{is} + \alpha\beta\pi_{cis} + \epsilon_{o(csi)}$ in the factorial design.

Terminology

The terminology used in the preceding section is, for the
most part, representative of the terminology used in texts on
experimental design (e.g., Kirk, 1968, and Winer, 1971). Since
class is the unit of analysis under consideration here,
Cronbach et al. (1972) would say that the split-plot and
factorial designs have two facets (students and items).

Furthermore, Cronbach et al. (1972) would distinguish
between the G study design used to estimate variance
components and the D study design used to make decisions.
In theory, the G study design and the D study design need
not be the same; however, for purposes of simplicity, unless
otherwise noted, we assume here that both the G and D studies
employ a split-plot design.

Assumptions

The assumptions for the split-plot design model in
Equation 1 are well documented in the literature and
experimental design texts. However, we wish to emphasize two of
these assumptions. First, each effect in the model is assumed
to be independent of every other effect. Second, in order to
make the estimates of the effects unique, the expected value
of each effect over any of its subscripts is set equal to zero.

This second assumption is especially critical to an
understanding of subsequent parts of this paper; however, this
assumption is easily misunderstood. Consider the effect $\alpha$ in
Equation 1. Suppose, for some study, we take a sample of n classes
from a population of N classes. The second assumption implies
that the sum of $\alpha$ over the population of N classes is constrained
to be zero, and the sum of the estimates of $\alpha$ over the sample
of n classes is constrained to be zero. However, it is not
necessarily true that the sum of $\alpha$ over the sample of n
classes is zero. In terms of means, the second assumption means
that the population mean of $\alpha$ equals zero and the sample mean
of the estimates of $\alpha$ equals zero, but the sample mean of $\alpha$
itself does not necessarily equal zero.
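
The distinction drawn in this assumption can be illustrated with a small simulation. The sketch below is only illustrative: the population of class effects, its size, and the noise-free "estimates" (deviations of the sampled effects from their own sample mean) are assumptions introduced here, not part of the original analysis.

```python
import random


def class_effect_sums(population_size=1000, sample_size=10, seed=0):
    """Return (population sum, sample sum, sum of estimates) of class effects."""
    rng = random.Random(seed)

    # Hypothetical population of class effects, centered so that the
    # effects sum to zero over the whole population of N classes.
    raw = [rng.gauss(0.0, 1.0) for _ in range(population_size)]
    raw_mean = sum(raw) / population_size
    alpha_population = [a - raw_mean for a in raw]

    # Draw n classes at random (without replacement) from the population.
    alpha_sample = rng.sample(alpha_population, sample_size)

    # Simplified, noise-free stand-in for the ANOVA estimates: each
    # estimate is the deviation of a sampled effect from the sample mean,
    # so the estimates sum to zero by construction.
    sample_mean = sum(alpha_sample) / sample_size
    alpha_estimates = [a - sample_mean for a in alpha_sample]

    return sum(alpha_population), sum(alpha_sample), sum(alpha_estimates)


population_sum, sample_sum, estimate_sum = class_effect_sums()
print(population_sum, sample_sum, estimate_sum)
```

The first and third sums are zero up to rounding error, while the second is generally not, which is the point of the assumption as stated.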

Generalizability Coefficients from a Split-Plot Design

By definition, a generalizability coefficient is the ratio
of the universe score variance to the expected value of the
observed score variance. For the split-plot model, with class
as the unit of analysis, four different generalizability
coefficients can be obtained. Each of these coefficients is
characterized by a different definition of universe score
and, hence, a different definition of error. However, for
each of these coefficients, the expected observed score
variance is identical.



Expected Observed Score Variance

Because the effects in Equation 1 are assumed to be
sampled independently, the expected observed score variance
is simply the sum of the variances of the separate effects.
For a sample of $n_i$ items and $n_s$ students, the expected observed
score variance for class means is:

$$\mathcal{E}\sigma^2(X_c) = \sigma^2(\alpha) + \frac{\sigma^2(\pi,\alpha\pi)}{n_s} + \frac{\sigma^2(\alpha\beta)}{n_i} + \frac{\sigma^2(\beta\pi,\alpha\beta\pi,e)}{n_i n_s} \tag{3}$$

where $\sigma^2(\pi,\alpha\pi)$ is the population variance of $\pi_{s(c)}$ in
Equation 1. The notation $\sigma^2(\pi,\alpha\pi)$ is used to emphasize that
the student effect, $\pi_s$, and the class by student interaction,
$\alpha\pi_{cs}$, are confounded in the effect $\pi_{s(c)}$. A similar
interpretation can be given to $\sigma^2(\beta\pi,\alpha\beta\pi,e)$ assuming there is only one
observation for a given student responding to a given item.
There is no variance component for the item effect because
the item effect is constant for all classes, and, therefore,
$\sigma^2(\beta)$ is zero. Note that the population variance components
in Equation 3 are for samples of one item and one student.
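
As a worked illustration of Equation 3, the following sketch computes the expected observed score variance of a class mean from the four variance components. The function name, argument names, and numerical values are hypothetical and chosen only for illustration.

```python
def expected_observed_variance(var_alpha, var_pi, var_alpha_beta,
                               var_resid, n_s, n_i):
    """Expected observed score variance of a class mean (Equation 3).

    var_alpha      -- class component, sigma^2(alpha)
    var_pi         -- confounded student component, sigma^2(pi, alpha pi)
    var_alpha_beta -- class by item component, sigma^2(alpha beta)
    var_resid      -- confounded residual, sigma^2(beta pi, alpha beta pi, e)
    n_s, n_i       -- students per class and items in the D study
    """
    return (var_alpha
            + var_pi / n_s
            + var_alpha_beta / n_i
            + var_resid / (n_i * n_s))


# Hypothetical components for single-student, single-item observations.
variance = expected_observed_variance(0.5, 2.0, 0.25, 4.0, n_s=20, n_i=10)
print(round(variance, 3))  # 0.5 + 0.1 + 0.025 + 0.02
```

With these illustrative values, the class component dominates once the student and residual components are divided by the D study sample sizes.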


In subsequent sections we develop the universe score
variance and the associated generalizability coefficient for
four different universes of generalization: (a) an infinite
universe of students and items, (b) an infinite universe of
students and a finite set of items, (c) a finite set of
students and an infinite universe of items, and (d) a finite
set of students and a finite set of items.


Infinite Universe of Students and Items

If the objective of a D study is to generalize to an
infinite universe of students and items, then the universe of
generalization is the completely crossed universe of
admissible observations. This analysis treats the set of
items used in the D study as a sample from an infinite
universe of items that could have been used to measure the
general outcome that is of interest; and the analysis treats
the students in each class as a sample from the infinite
universe of students who might have been included in the
class. The universe score for each class is the expected
value of the observed mean over all possible samples of
students and items in the universe of admissible observations,
and is given by

$$\nu_c = \mu + \alpha_c . \tag{4}$$

The corresponding universe score variance is

$$\sigma^2(\nu) = \sigma^2(\alpha) . \tag{5}$$

The universe score variance can also be obtained by taking the limit
of the observed score variance in Equation 3 as $n_s$ and $n_i$
both go to infinity.

For generalization to an infinite universe of students
and items, the appropriate generalizability coefficient is the
ratio of universe score variance (Equation 5) to expected
observed score variance (Equation 3):



$$\mathcal{E}\rho^2(S,I) = \frac{\sigma^2(\alpha)}{\sigma^2(\alpha) + \left[ \dfrac{\sigma^2(\pi,\alpha\pi)}{n_s} + \dfrac{\sigma^2(\alpha\beta)}{n_i} + \dfrac{\sigma^2(\beta\pi,\alpha\beta\pi,e)}{n_i n_s} \right]} \tag{6}$$

The notation $\mathcal{E}\rho^2$ is consistent with Cronbach et al. (1972)
and indicative of the fact that generalizability coefficients
can be interpreted as squared correlation or intraclass
correlation coefficients. The letters S and I in parentheses
indicate that the universe of generalization is an infinite
universe of students (S) and an infinite universe of items (I).
The brackets in the denominator identify the components of
variance that jointly constitute error variance when the
universe score variance is $\sigma^2(\alpha)$. Thus, the expected observed
score variance can be viewed as universe score variance plus
error variance.
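
To make the structure of Equation 6 concrete, the sketch below computes the coefficient for an infinite universe of students and items over a small grid of D study sample sizes. The function name and the variance component values are hypothetical.

```python
def g_coefficient_s_i(var_alpha, var_pi, var_alpha_beta, var_resid, n_s, n_i):
    """Generalizability coefficient for infinite students and items (Equation 6)."""
    # Bracketed error variance in the denominator of Equation 6.
    error = (var_pi / n_s
             + var_alpha_beta / n_i
             + var_resid / (n_i * n_s))
    return var_alpha / (var_alpha + error)


# Hypothetical variance components (illustrative values only).
components = dict(var_alpha=0.5, var_pi=2.0, var_alpha_beta=0.25, var_resid=4.0)

for n_s in (10, 20, 40):
    for n_i in (5, 10, 20):
        coefficient = g_coefficient_s_i(n_s=n_s, n_i=n_i, **components)
        print(f"n_s={n_s:2d}  n_i={n_i:2d}  coefficient={coefficient:.3f}")
```

Because every error component is divided by a sample size, the coefficient rises as either the number of students or the number of items increases, but it does so at different rates for the two facets.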

Equation 6 results from the assumption that the universe
of generalization is, in effect, the completely crossed
universe of admissible observations. However, the purpose of
a particular D study may indicate that the universe of
generalization should be restricted to some subset of the completely
crossed universe of admissible observations. That is, the
investigator may wish to generalize to some finite subset of
the possible conditions (or levels) for one or more facets in
the universe of admissible observations. In particular, in
some cases, it may not be appropriate to generalize beyond the
specific conditions of some facet(s) included in the D study.

For example, an investigator who is interested in how
well a training program has taught students to perform a
particular set of mechanical tasks is not interested in
generalizing to a broad set of such tasks. The finite set of tasks
on which observations are taken may constitute the universe of
generalization for the task facet. (By contrast, if the
hypothesis under consideration concerns general mechanical ability,
the universe of generalization would be taken as an infinite
universe of possible tasks that might have been observed.)

Theoretically, then, for the split-plot design, some
D studies may require a universe of generalization in which
the set of items (or tasks) is finite, the set of students is
finite, or both are finite.

Infinite Universe of Students; Finite Set of Items

If generalization is to the finite set of items included
in the D study, then the universe score is the expected value
of the observed mean score ($X_c$) over that particular set of
items and over all students. The components that enter into
the observed score are not changed by restricting the universe
of generalization, and Equation 1 is still the appropriate
model. In taking the expected value of $X_c$, only terms with
s as a subscript become zero, and the universe score is
given by

$$\nu_c = \mu + \alpha_c + \bar{\beta} + \overline{\alpha\beta}_c , \tag{7}$$

where the bars denote means over the $n_i$ items in the D study.
The item effect and the class by item interaction are present
in Equation 7 because the expected value is not taken over all
items in the universe of admissible observations, and there
will, in general, be systematic effects due to the finite set
of items included in the universe of generalization.

The universe score variance corresponding to Equation 7 is

$$\sigma^2(\nu) = \sigma^2(\alpha) + \frac{\sigma^2(\alpha\beta)}{n_i} . \tag{8}$$

Equation 8 can also be derived from Equation 3 by taking the
limit as $n_s$ approaches infinity. Again, $\bar{\beta}$ is a constant for
all classes, and $\sigma^2(\bar{\beta})$ is zero.

For generalization to an infinite universe of students (S)
and the finite set of items (I*) used in the D study, the
generalizability coefficient is obtained from Equations 3 and 8:

$$\mathcal{E}\rho^2(S,I^*) = \frac{\sigma^2(\alpha) + \dfrac{\sigma^2(\alpha\beta)}{n_i}}{\sigma^2(\alpha) + \dfrac{\sigma^2(\alpha\beta)}{n_i} + \left[ \dfrac{\sigma^2(\pi,\alpha\pi)}{n_s} + \dfrac{\sigma^2(\beta\pi,\alpha\beta\pi,e)}{n_i n_s} \right]} \tag{9}$$

Infinite Universe of Items; Finite Set of Students

In educational research and evaluation, it is generally
inappropriate to restrict the universe of generalization for
the student facet. For diagnostic purposes, we may be
interested in the universe score for a single student, but class
means are seldom used in this way. In program evaluation and
research, the intention is almost always to generalize to
some population of present and/or future students.

Nevertheless, one can obtain the generalizability
coefficient for a finite set of students and an infinite
universe of items. Later we will show that this coefficient
corresponds to one of the statistics reported in the literature
for estimating the reliability of class means. For this
universe of generalization, the universe score is

$$\nu_c = \mu + \alpha_c + \bar{\pi}_{(c)} , \tag{10}$$

where $\bar{\pi}_{(c)}$ denotes the mean student effect over the $n_s$
students in class c, and the universe score variance is

$$\sigma^2(\nu) = \sigma^2(\alpha) + \frac{\sigma^2(\pi,\alpha\pi)}{n_s} . \tag{11}$$

The variance due to the student effect does not go to zero
because a different set of students is in each class.
Equation 11 can also be derived from Equation 3 by taking the
limit as $n_i$ approaches infinity.

For generalization to an infinite universe of items (I)
and the finite set of students (S*) used in the D study, the
generalizability coefficient is obtained from Equations 3 and 11:

$$\mathcal{E}\rho^2(S^*,I) = \frac{\sigma^2(\alpha) + \dfrac{\sigma^2(\pi,\alpha\pi)}{n_s}}{\sigma^2(\alpha) + \dfrac{\sigma^2(\pi,\alpha\pi)}{n_s} + \left[ \dfrac{\sigma^2(\alpha\beta)}{n_i} + \dfrac{\sigma^2(\beta\pi,\alpha\beta\pi,e)}{n_i n_s} \right]} \tag{12}$$


Finite Set of Students; Finite Set of Items

Restricting generalization to a particular set of
students and a particular set of items is even less likely
to be appropriate than restricting the universe for either
facet and generalizing over the other. The results are
presented here because they lead to a coefficient that
corresponds to a reliability coefficient that has been proposed for
class means. The universe score for a fixed set of students
in each class, and a fixed set of items for all classes is

$$\nu_c = \mu + \alpha_c + \bar{\pi}_{(c)} + \bar{\beta} + \overline{\alpha\beta}_c + \overline{\beta\pi}_{(c)} , \tag{13}$$

where the bars denote means over the students in class c and
the items in the D study, and the universe score variance is

$$\sigma^2(\nu) = \sigma^2(\alpha) + \frac{\sigma^2(\pi,\alpha\pi)}{n_s} + \frac{\sigma^2(\alpha\beta)}{n_i} + \frac{\sigma^2(\beta\pi,\alpha\beta\pi)}{n_i n_s} . \tag{14}$$

The universe score variance is estimable if the effects
$\beta\pi_{is(c)}$ and $\epsilon_{o(csi)}$ in Equation 1 are not confounded; that is,
Equation 14 is estimable if there is more than one
replication of each class-student-item observation. This
coefficient can also be estimated if it can be assumed that
$\sigma^2(\beta\pi,\alpha\beta\pi)/n_i n_s$ equals zero; in this case, the true score
variance is given by the first three terms in Equation 14,
and these variance components are all estimable.

For generalization to the finite set of items (I*) and
the finite set of students (S*) in the D study, the
generalizability coefficient is obtained from Equations 3 and 14:

$$\mathcal{E}\rho^2(S^*,I^*) = \frac{\sigma^2(\alpha) + \dfrac{\sigma^2(\pi,\alpha\pi)}{n_s} + \dfrac{\sigma^2(\alpha\beta)}{n_i} + \dfrac{\sigma^2(\beta\pi,\alpha\beta\pi)}{n_i n_s}}{\sigma^2(\alpha) + \dfrac{\sigma^2(\pi,\alpha\pi)}{n_s} + \dfrac{\sigma^2(\alpha\beta)}{n_i} + \dfrac{\sigma^2(\beta\pi,\alpha\beta\pi)}{n_i n_s} + \left[ \dfrac{\sigma^2(e)}{n_o n_i n_s} \right]} \tag{15}$$

where $n_o$ is the number of replications of each class-student-
item observation.
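
The four coefficients differ only in which components are counted as universe score variance. The sketch below computes all four from one set of components. It assumes, as an extension of Equation 3 that matches the denominator of the last coefficient, that the residual splits into an interaction part and a replication error part divided by a hypothetical number of replications `n_o`; all names and values are illustrative.

```python
def class_mean_g_coefficients(var_alpha, var_pi, var_alpha_beta,
                              var_beta_pi, var_e, n_s, n_i, n_o=1):
    """Return the four generalizability coefficients for class means.

    Keys name the universe of generalization: a starred facet is a
    finite (fixed) set, an unstarred facet is an infinite universe.
    """
    student_term = var_pi / n_s
    item_term = var_alpha_beta / n_i
    interaction_term = var_beta_pi / (n_i * n_s)
    replication_term = var_e / (n_o * n_i * n_s)

    # Expected observed score variance; identical for all four coefficients.
    observed = (var_alpha + student_term + item_term
                + interaction_term + replication_term)

    return {
        "S,I": var_alpha / observed,
        "S,I*": (var_alpha + item_term) / observed,
        "S*,I": (var_alpha + student_term) / observed,
        "S*,I*": (var_alpha + student_term + item_term
                  + interaction_term) / observed,
    }


# Hypothetical variance components (illustrative values only).
coefficients = class_mean_g_coefficients(
    var_alpha=0.5, var_pi=2.0, var_alpha_beta=0.25,
    var_beta_pi=3.0, var_e=1.0, n_s=20, n_i=10)
for universe, value in coefficients.items():
    print(f"{universe:6s} {value:.3f}")
```

Since the denominator is shared, the coefficient for infinite universes of students and items is never larger than any of the others, and the coefficient for finite sets of both is never smaller. In practice the last coefficient can only be estimated when the interaction and replication components are separable, as the text notes.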

Classical Reliability and the Spearman-Brown Correction

All four of these generalizability coefficients have the
general form of a reliability coefficient if true score is
defined to be equal to the appropriate universe score. The
differences among the coefficients are, then, the differences
among their definitions of true score and error score.

For $\mathcal{E}\rho^2(S,I)$, Equation 6, the universe score variance is
the sampling variance of the main effect due to classes, $\sigma^2(\alpha)$.
All other components in the observed score variance are
sources of error. In classical test theory, the error variance
is undifferentiated; increasing the number of observations
by a factor of M leaves the true score variance unchanged
and multiplies the error variance by 1/M. This
regularity is the basis for the Spearman-Brown formula for changes
in the length of a test. In Equation 6, the error variance has
no such simple relationship to the number of students, items,
or the product of the two; consequently, the Spearman-Brown
formula does not apply. It is, however, easy to compute
$\mathcal{E}\rho^2(S,I)$ for any number of students and items by substituting
the appropriate values of $n_s$ and $n_i$ in Equation 6.

For $\mathcal{E}\rho^2(S,I^*)$, Equation 9, where interest is restricted
to the finite set of items in the D study, the class by item
interaction is a component in the true score variance. For
this coefficient, increasing the number of students by a
factor of M will multiply the error variance by 1/M but
will not affect the universe score variance. Thus, the
Spearman-Brown formula holds for the number of students.
However, increasing the number of items by a factor of M
does not multiply the error variance by 1/M and does
affect the universe score variance. Thus, the Spearman-Brown
formula does not hold for items.

Similarly, the Spearman-Brown formula applies to $\mathcal{E}\rho^2(S^*,I)$
for changes in the number of items but not for changes in the
number of students. Finally, the Spearman-Brown formula
applies to $\mathcal{E}\rho^2(S^*,I^*)$ for changes in the number of replications
but does not apply for changes in the number of students or the
number of items because such changes affect the universe
score variance.

/k . Estimation of Variance Components 

The process of <^obtaining numerical estimates of " 

. ^ ' ' ' / 

general-izabilitY coefficients usually involves two steps ."^ 

First, t^e components of variance are ' estimated from the G 
stu(3y. Then, the generalizability coefficient is calculated » ' 
us-ing the estimated vari^nge components and the sample sizes- 
from the D .study. ' ' ' . > t- 

General procedures for the estimation of variance components from computed mean squares are discussed by Cornfield and Tukey (1956), Cronbach et al. (1972), Millman and Glass (1967), and by most standard textbooks on experimental design (e.g., Kirk, pp. 208-212, and Winer, pp. 321-332).

In the next two sections we treat the estimation of variance components when both the G and D studies use split-plot designs. Subsequently, we briefly consider the estimation of variance components when the G study is a factorial design and the D study is a split-plot design.

In order to estimate the variance components, we must specify whether the model assumes random, mixed, or fixed effects. The choice among a random, mixed, or fixed effects model is closely related to the choice of a universe of generalization. To treat a facet as a random effect is to say that the observed conditions of the facet are sampled from an infinite universe of similar conditions. To treat a facet as a fixed effect is to say that the observed conditions of the facet constitute the universe of conditions of the facet. In subsequent sections we discuss the choice among random, mixed, and fixed effects models in the G study, and the implications of this choice for later D studies.

Random Effects Split-Plot Design

The four generalizability coefficients have all been developed in terms of components of variance for a random effects analysis of variance. It was assumed that the classes and the conditions of the two facets were sampled from infinite universes of possible classes and conditions. The formulas for the expected values of the mean squares, based on a random model, are presented in Table 1, where it is assumed that all classes have the same number of students. In Table 1, primes are used with sample sizes in order to distinguish G study sample sizes from subsequent D study sample sizes. Table 1 also provides estimates of the variance components in terms of mean squares, from the random effects model.



Insert Table 1 about here 



Using the estimated components of variance in Table 1 and the sample sizes from the D study, we can estimate each of the four generalizability coefficients discussed previously. That is, the components of variance from the random effects model can be used to estimate generalizability coefficients for random, mixed, or fixed effects. Generally, this is the most efficient and useful way to calculate these coefficients. However, if both the G and D study imply the same universe of generalization (e.g., both have items fixed and students random), then one can redefine the structural model and calculate the appropriate generalizability coefficient more directly.
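The two-step procedure can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' computation: the mean squares are hypothetical, the function and key names are ours, and only the three coefficients whose form can be read directly from the text of this section are computed (the coefficient with both facets fixed is omitted). The estimators follow Table 1, with G study sample sizes used to estimate the components and D study sample sizes used in the coefficients.

```python
def components_from_mean_squares(ms, n_s_g, n_i_g):
    """Table 1 estimators; n_s_g and n_i_g are the G study sizes n'_s and n'_i."""
    return {
        "residual": ms["R"],
        "class_item": (ms["CI"] - ms["R"]) / n_s_g,
        "student": (ms["S"] - ms["R"]) / n_i_g,
        "class": (ms["C"] - ms["S"] - ms["CI"] + ms["R"]) / (n_i_g * n_s_g),
    }


def coefficients(var, n_s, n_i):
    """Three generalizability coefficients for D study sizes n_s and n_i."""
    student_term = var["student"] / n_s
    item_term = var["class_item"] / n_i
    # Expected observed score variance of a class mean.
    observed = (var["class"] + student_term + item_term
                + var["residual"] / (n_i * n_s))
    return {
        "S,I": var["class"] / observed,
        "S,I*": (var["class"] + item_term) / observed,
        "S*,I": (var["class"] + student_term) / observed,
    }


# Hypothetical G study mean squares with n'_s = 20 students and n'_i = 10 items.
ms = {"C": 48.0, "S": 6.0, "CI": 3.0, "R": 1.5}
var = components_from_mean_squares(ms, n_s_g=20, n_i_g=10)

# A D study that plans 30 students per class and 15 items.
coef = coefficients(var, n_s=30, n_i=15)
for name, value in coef.items():
    print(f"E rho^2({name}) = {value:.3f}")
```

Keeping the component estimation separate from the coefficient calculation mirrors the distinction between G study and D study sample sizes.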

Mixed Effects and Fixed Effects Split-Plot Designs

The analysis of variance in Table 1 treats all effects as random effects. The data from the G study can also be analyzed using a mixed model in which one of the facets is treated as a fixed effect and the other is treated as a random effect. In treating a facet as a fixed effect, the investigator is deciding that his interest is in the observations collected under the finite set of conditions of the fixed facet and in no other possible conditions of that facet. For the split-plot design employing mixed effects, either items or students will be fixed but not both.

If the item facet is fixed and the student facet is random, then the finite set of items under consideration constitutes the universe of generalization for the item facet. Thus, when we take the mean over items in Equation 1, the item main effect and all other effects involving items are zero. The resulting structural model for the observed mean score for a class is

$$X_{c\cdot\cdot} = \mu + \alpha_c + (\pi,\alpha\pi)_{c\cdot} . \qquad (16)$$

Using Equation 16, the expected observed score variance is

$$\sigma^2_{I^*}(X_{c\cdot\cdot}) = \sigma^2_{I^*}(\alpha) + \sigma^2_{I^*}(\pi,\alpha\pi)/n_s ; \qquad (17)$$

the universe score variance is

$$\sigma^2_{I^*}(\alpha) ; \qquad (18)$$

and the generalizability coefficient is

$$\mathcal{E}\rho^2(S,I^*) = \frac{\sigma^2_{I^*}(\alpha)}{\sigma^2_{I^*}(\alpha) + \sigma^2_{I^*}(\pi,\alpha\pi)/n_s} , \qquad (19)$$

where the subscript $I^*$ is used to indicate that these components of variance are estimated from a mixed model with the item facet fixed. Equations 9 and 19 have the same interpretation, and, as shown below, they are algebraically identical.



Insert Table 2 about here 



Table 2 lists the expected values of the mean squares for a mixed model ANOVA with the item effect fixed and all other effects random. Table 2 also provides the estimated values of the variance components in terms of mean squares. In comparing the mixed model in Table 2 with the random model in Table 1, we note that the mean squares in both tables are identical for all sources. Furthermore,

$$\hat\sigma^2_{I^*}(\alpha) = \frac{MS(C) - MS(S)}{n'_i n'_s} = \frac{MS(C) - MS(S) - MS(CI) + MS(R)}{n'_i n'_s} + \frac{MS(CI) - MS(R)}{n'_i n'_s} = \hat\sigma^2(\alpha) + \hat\sigma^2(\alpha\beta)/n'_i . \qquad (20)$$

Similarly, it is straightforward to show that

$$\hat\sigma^2_{I^*}(\pi,\alpha\pi) = \hat\sigma^2(\pi,\alpha\pi) + \hat\sigma^2(\beta\pi,\alpha\beta\pi,e)/n'_i . \qquad (21)$$

The algebraic equivalence of Equations 9 and 19 is now immediately evident.

Also, if the number of students and the number of items are the same in the G and D study (i.e., $n'_i = n_i$ and $n'_s = n_s$), it is easy to show that, for both the random and mixed models, the estimated value of the coefficient is given by

$$\mathcal{E}\hat\rho^2(S,I^*) = \frac{MS(C) - MS(S)}{MS(C)} . \qquad (22)$$
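The agreement between the random and mixed model routes to this coefficient can be verified on a small numerical example. The sketch below uses hypothetical mean squares and our own variable names; it assumes identical G and D study sample sizes, applies the estimators of Tables 1 and 2, and compares the three resulting values.

```python
# Hypothetical mean squares; the same sample sizes serve as G and D study sizes.
ms = {"C": 48.0, "S": 6.0, "CI": 3.0, "R": 1.5}
n_s, n_i = 20, 10

# Random model estimators (Table 1).
residual = ms["R"]
class_item = (ms["CI"] - ms["R"]) / n_s
student = (ms["S"] - ms["R"]) / n_i
class_var = (ms["C"] - ms["S"] - ms["CI"] + ms["R"]) / (n_i * n_s)

# Mixed model estimators with items fixed (Table 2).
class_var_mixed = (ms["C"] - ms["S"]) / (n_i * n_s)
student_mixed = ms["S"] / n_i

# Coefficient from random model components (students random, items fixed).
observed = (class_var + student / n_s + class_item / n_i
            + residual / (n_i * n_s))
from_random = (class_var + class_item / n_i) / observed

# Coefficient from mixed model components (form of Equation 19).
from_mixed = class_var_mixed / (class_var_mixed + student_mixed / n_s)

# Direct ratio of mean squares (form of Equation 22).
from_mean_squares = (ms["C"] - ms["S"]) / ms["C"]

print(from_random, from_mixed, from_mean_squares)
```

The first two checks in the example correspond to Equations 20 and 21: the mixed model class component equals the random model class component plus the class by item component divided by the number of items, and the mixed model student component absorbs the residual in the same way.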

The random effects ANOVA outlined in Table 1 attributes the mean square for classes to four sources, two of which involve the sampling variance due to item effects. In the mixed model ANOVA outlined in Table 2, the mean square for classes is attributed to two effects, and the interaction effects involving items do not appear. In the mixed model, there can be no sampling variance for item effects because the mean scores are not based on a sample of items but on the universe of items. The variance that was attributed to item effects in the random model must now be attributed to the sampling of students or to differences between the universe scores for classes.

A comparison of Table 1 with Table 2 shows that the mean squares used to estimate $\sigma^2(\beta\pi,\alpha\beta\pi,e)$ and $\sigma^2(\pi,\alpha\pi)$ in the random model are used to estimate $\sigma^2_{I^*}(\pi,\alpha\pi)$ in the mixed model. Similarly, the mean squares used to estimate $\sigma^2(\alpha\beta)$ and $\sigma^2(\alpha)$ in the random model are used to estimate $\sigma^2_{I^*}(\alpha)$ in the mixed model. The estimates of the class effect and the student effect are larger for the mixed model than they are for the random model.

This redistribution does not affect the expected observed score variance, but it does change our estimate of the universe score variance. That part of the mean square for classes that is assumed to be due to sampling of items in the random model is assumed to be due to differences between class universe scores in the mixed model.

The mixed model has led to a somewhat simpler expression for $\mathcal{E}\rho^2(S,I^*)$, and it will yield the same value for the estimate of this coefficient. Within the assumptions of the mixed model it is not possible to estimate the generalizability coefficient $\mathcal{E}\rho^2(S,I)$, which assumes generalization over both facets. For this reason, the mixed model is not recommended for the analysis of the G study data. The mixed model has been introduced mainly to provide additional insight into the nature of the differences between the generalizability coefficients introduced earlier.

Similar analyses can be carried out for the fixed effects model and the second mixed model in which the student facet is fixed and the item facet is random. For both of these models the generalizability coefficient is equal to the variance attributed to the class effect for the particular model divided by the estimated observed score variance. The numerical estimates of these coefficients will be identical to those previously obtained using components of variance from the random model.

It is generally best to estimate and report components of variance for the random model. If the components of the random model are known, any of the four generalizability coefficients can be estimated; but the components from a model with a fixed facet cannot be used to estimate a generalizability coefficient that assumes generalization over that facet.

Random Effects Factorial Design

If the G study is a factorial design and the D study is a split-plot design, then to calculate generalizability coefficients for the D study we estimate the appropriate variance components from the G study factorial design (see Equation 2). Since the effects are independently sampled, $\sigma^2(\pi,\alpha\pi) = \sigma^2(\pi) + \sigma^2(\alpha\pi)$ in terms of variance components from the factorial design. Similarly, $\sigma^2(\beta\pi,\alpha\beta\pi,e) = \sigma^2(\beta\pi) + \sigma^2(\alpha\beta\pi,e)$ if there is only one observation per class-student-item combination in the G study; or $\sigma^2(\beta\pi,\alpha\beta\pi,e) = \sigma^2(\beta\pi) + \sigma^2(\alpha\beta\pi) + \sigma^2(e)$ if there are replicated observations. Since $\sigma^2(\alpha)$ and $\sigma^2(\alpha\beta)$ are unconfounded in both the factorial and split-plot design, these variance components have the same interpretation in both designs.
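The pooling of factorial components into split-plot components amounts to simple sums. The sketch below is illustrative only: the dictionary keys and the numerical values are hypothetical, and the function name is ours.

```python
def split_plot_components(factorial, replicated):
    """Map factorial-design variance components onto split-plot components."""
    combined = {
        # Unconfounded in both designs, so carried over unchanged.
        "class": factorial["class"],
        "class_item": factorial["class_item"],
        # Student effect is confounded with the class by student interaction.
        "student": factorial["student"] + factorial["class_student"],
    }
    if replicated:
        # Replicated observations: error is estimated separately.
        combined["residual"] = (factorial["item_student"]
                                + factorial["class_item_student"]
                                + factorial["error"])
    else:
        # One observation per cell: highest interaction is confounded with error.
        combined["residual"] = (factorial["item_student"]
                                + factorial["class_item_student_error"])
    return combined


# Hypothetical factorial G study components, one observation per cell.
factorial = {
    "class": 0.20, "class_item": 0.05,
    "student": 0.70, "class_student": 0.30,
    "item_student": 0.25, "class_item_student_error": 0.35,
}
var = split_plot_components(factorial, replicated=False)
print(var)
```

The pooled components can then be used with the D study sample sizes in the same way as components estimated directly from a split-plot G study.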

Given the above relationships, we can estimate each of the generalizability coefficients for the split-plot design, with the possible exception of $\mathcal{E}\rho^2(S^*,I^*)$, Equation 15. This is the appropriate coefficient when the item and student facet are both fixed. It can be estimated only if: (a) there is more than one observation for each class-student-item combination or (b) we assume that both $\sigma^2(\beta\pi)$ and $\sigma^2(\alpha\beta\pi)$ are equal to zero.

Generalizability Coefficients as 
Expected Values of Correlations 

Each of the four generalizability coefficients can also be interpreted as the expected value of a correlation between pairs of measurements on a sample of classes. In order to examine the dependability of class means as correlation coefficients, it is necessary to obtain two measurements on each class. The appropriate procedure for obtaining these two measurements depends on the definition of the universe of generalization.

Generalization Over Students and Items

If both students and items are sampled from infinite universes, each measurement involves a sample of students and a sample of items. The sampling of students and items for a second set of measurements is independent of the first. Using the random model each measurement has the form

$$X_{c\cdot\cdot} = \mu + \alpha_c + \beta_{\cdot} + (\pi,\alpha\pi)_{c\cdot} + (\alpha\beta)_{c\cdot} + (\beta\pi,\alpha\beta\pi,e)_{c\cdot\cdot} . \qquad (23)$$

For any pair of measurements, $X_{c\cdot\cdot}$ and $X'_{c\cdot\cdot}$, each measurement has the same expected observed score variance, given by Equation 3.

The expected value of the covariance of $X_{c\cdot\cdot}$ with $X'_{c\cdot\cdot}$ is $\sigma^2(\alpha)$. The other effects are sampled independently for the two sets of measurements, and, therefore, the expected values of all other terms in the covariance are zero.

The expected value of the correlation between the two mean scores is approximately equal to the expected value of the covariance divided by the expected value of the variance (Lord & Novick, 1968, pp. 201-203):

$$\mathcal{E}\rho(X_{c\cdot\cdot}, X'_{c\cdot\cdot}) \approx \frac{\sigma^2(\alpha)}{\sigma^2(\alpha) + \dfrac{\sigma^2(\pi,\alpha\pi)}{n_s} + \dfrac{\sigma^2(\alpha\beta)}{n_i} + \dfrac{\sigma^2(\beta\pi,\alpha\beta\pi,e)}{n_i n_s}} , \qquad (24)$$

which is identical to $\mathcal{E}\rho^2(S,I)$.

Thus, the generalizability coefficient, generalizing over both students and items, is approximately equal to the expected correlation between two sets of measurements taken on a sample of classes, where the two sets of measurements are based on independent samples of both students and items. A correlation of this kind can be obtained on a set of classes by taking the mean on half of the students and half of the items as one measurement, and the mean on the remaining students and items as the other measurement. Unfortunately, it is not possible to apply the Spearman-Brown formula in this case; consequently, it is generally necessary to use the ANOVA procedures outlined earlier.
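The failure of the Spearman-Brown formula for the double split can be seen in a short calculation. In this sketch the variance components are hypothetical and the function names are ours; the coefficient is the ratio in Equation 24, evaluated once for half of the students and half of the items and once for the full sample sizes.

```python
def rho2_students_and_items(var, n_s, n_i):
    """Ratio of class variance to expected observed score variance (Equation 24)."""
    error = (var["student"] / n_s + var["class_item"] / n_i
             + var["residual"] / (n_i * n_s))
    return var["class"] / (var["class"] + error)


def spearman_brown(r, m):
    """Classical Spearman-Brown step-up for lengthening by a factor m."""
    return m * r / (1 + (m - 1) * r)


# Hypothetical variance components, chosen only for illustration.
var = {"class": 0.20, "student": 1.00, "class_item": 0.05, "residual": 0.60}
n_s, n_i = 20, 10

half = rho2_students_and_items(var, n_s // 2, n_i // 2)
full = rho2_students_and_items(var, n_s, n_i)
stepped_up = spearman_brown(half, 2)

print(f"half students, half items: {half:.4f}")
print(f"stepped up by a factor 2:  {stepped_up:.4f}")
print(f"full students, full items: {full:.4f}")
```

Doubling both sample sizes halves the student and class by item terms but divides the residual term by four, so no single step-up factor reproduces the full-sample coefficient whenever the residual component is nonzero.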

Generalization over Students Only

If generalization is to an infinite universe of students and to the finite set of items used in some D study, an appropriate pair of measurements would use the same items but independent samples of students. An estimate of the correlation coefficient could be obtained by taking a random split on each class and correlating the mean scores over the two halves of each class and over all items. In this case the Spearman-Brown formula does apply and can be used to estimate the correlation for full classes.

The derivation of the expected value of the correlation, corrected for class size, over all possible splits on classes is found using the same procedure employed in the previous case. The expected observed score variance for the two measures is again given by Equation 3. The expected value of the covariance, however, has one additional term. Because the same sample of items is used for both measurements,

$$\mathrm{cov}\left((\alpha\beta)_{c\cdot}, (\alpha\beta)'_{c\cdot}\right) = \sigma^2(\alpha\beta)/n_i ,$$

and the expected value of the covariance is

$$\sigma^2(\alpha) + \sigma^2(\alpha\beta)/n_i .$$

The ratio of the expected covariance to the expected observed score variance is $\mathcal{E}\rho^2(S,I^*)$. This coefficient, generalizing over students but not over items, is approximately equal to the expected value of a split-class estimate of reliability that is corrected for class size using the Spearman-Brown formula.

Other Universes of Generalization

Similarly, it can be shown that the coefficient $\mathcal{E}\rho^2(S^*,I)$, for generalization over items but not over students, is approximately equal to the expected value of the split-halves reliability corrected for test length. Also, the generalizability coefficient $\mathcal{E}\rho^2(S^*,I^*)$, generalizing only over random error, is approximately equal to the correlation between two independent measurements of the class mean using the same items and the same students for both measurements.

Previously Reported Reliability Coefficients

The earlier sections of this paper have presented a discussion of four generalizability coefficients for estimating the dependability of class means. In this section, three coefficients that have been proposed for estimating the reliability of class means will be presented and related to the generalizability coefficients discussed earlier.

In a discussion of the statistical properties of school means, Shaycroft (1962) proposed the following coefficient for estimating the reliability of class means:

$$r_{\bar{A}\bar{A}} = 1 - \frac{\sigma_A^2\,(1 - r_{AA})}{n_s\,\sigma_{\bar{A}}^2} , \qquad (25)$$

where

$r_{\bar{A}\bar{A}}$ = reliability of class means,
$r_{AA}$ = reliability of student scores,
$n_s$ = number of students per class,
$\sigma_{\bar{A}}$ = standard deviation of class means, and
$\sigma_A$ = standard deviation of student scores.

The translation of this formula into the notation used in this paper is straightforward.

From previous results, $\sigma_{\bar{A}}^2$ is by definition the expected observed score variance, $\sigma^2(X_{c\cdot\cdot})$, in Equation 3. Brennan (in press) provides formulas for $\sigma_A^2$ and $r_{AA}$ in terms of components of variance from the split-plot model:

$$\sigma_A^2 = \sigma^2(\pi,\alpha\pi) + K\sigma^2(\alpha) + [K\sigma^2(\alpha\beta) + \sigma^2(\beta\pi,\alpha\beta\pi,e)]/n_i \qquad (26)$$

and

$$r_{AA} = \frac{\sigma^2(\pi,\alpha\pi) + K\sigma^2(\alpha)}{\sigma_A^2} , \qquad (27)$$

where

$$K = \frac{n_s(n_c - 1)}{n_s n_c - 1} . \qquad (28)$$

Shaycroft's formula assumes that the G study and the D study use the same data; therefore, there is no need to use primes to distinguish G study and D study sample sizes.

Substituting for $r_{AA}$, $\sigma_A^2$, and $\sigma_{\bar{A}}^2$ in Equation 25 gives Shaycroft's formula in terms of components of variance for the random effects split-plot model:

$$r_{\bar{A}\bar{A}} = \frac{\sigma^2(\alpha) + \dfrac{\sigma^2(\pi,\alpha\pi)}{n_s} + L\,\dfrac{\sigma^2(\alpha\beta)}{n_i}}{\sigma^2(\alpha) + \dfrac{\sigma^2(\pi,\alpha\pi)}{n_s} + \dfrac{\sigma^2(\alpha\beta)}{n_i} + \dfrac{\sigma^2(\beta\pi,\alpha\beta\pi,e)}{n_i n_s}} , \qquad (29)$$

where

$$L = \frac{n_c(n_s - 1)}{n_s n_c - 1} .$$

The coefficient L in Equation 29 depends upon the number of classes and the number of students per class used to estimate $r_{AA}$ and $\sigma_A^2$. The coefficients K and L arise because the student effect is confounded with the class by student interaction in the split-plot design; thus, these coefficients reflect complexities in calculating the appropriate number of degrees of freedom when the sampling of students is stratified by class, rather than completely random.

Since $n_s$ and $n_c$ are both greater than one in the split-plot design, L is between zero and one; and, therefore, Shaycroft's coefficient (Equation 25) overestimates $\mathcal{E}\rho^2(S,I^*)$ and underestimates $\mathcal{E}\rho^2(S^*,I^*)$.

In most practical situations, $n_s$ (and possibly $n_c$) is likely to be fairly large; and, consequently, the coefficient L will be close to unity. Assuming that $\sigma^2(\beta\pi,\alpha\beta\pi)$ is close to zero, it follows that $r_{\bar{A}\bar{A}}$ will be approximately equal to $\mathcal{E}\rho^2(S^*,I^*)$.
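The substitution that leads from Equation 25 to Equation 29 can be checked numerically. The sketch below is an illustration under stated assumptions: the variance components and sample sizes are hypothetical, the function names are ours, and L is taken to be $n_c(n_s - 1)/(n_s n_c - 1)$, which equals $1 - K/n_s$ and is the value implied by Equations 25 to 28.

```python
def shaycroft_from_definition(var, n_s, n_i, n_c):
    """Equation 25, with student-score quantities from Equations 26 to 28."""
    k = n_s * (n_c - 1) / (n_s * n_c - 1)                       # Equation 28
    var_students = (var["student"] + k * var["class"]
                    + (k * var["class_item"] + var["residual"]) / n_i)  # Eq. 26
    rel_students = (var["student"] + k * var["class"]) / var_students   # Eq. 27
    var_means = (var["class"] + var["student"] / n_s
                 + var["class_item"] / n_i
                 + var["residual"] / (n_i * n_s))               # Equation 3
    return 1 - var_students * (1 - rel_students) / (n_s * var_means)


def shaycroft_from_components(var, n_s, n_i, n_c):
    """Equation 29, with L assumed to be n_c (n_s - 1) / (n_s n_c - 1)."""
    big_l = n_c * (n_s - 1) / (n_s * n_c - 1)
    var_means = (var["class"] + var["student"] / n_s
                 + var["class_item"] / n_i
                 + var["residual"] / (n_i * n_s))
    numerator = (var["class"] + var["student"] / n_s
                 + big_l * var["class_item"] / n_i)
    return numerator / var_means


# Hypothetical variance components and sample sizes.
var = {"class": 0.20, "student": 1.00, "class_item": 0.05, "residual": 0.60}
r_definition = shaycroft_from_definition(var, n_s=20, n_i=10, n_c=30)
r_components = shaycroft_from_components(var, n_s=20, n_i=10, n_c=30)
print(r_definition, r_components)
```

With 20 students per class and 30 classes, L is already about 0.95, which illustrates why the coefficient approaches the form with L equal to unity in most practical situations.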

Wiley (1970) has proposed an intraclass correlation coefficient for estimating the reliability of class means. In his analysis, the estimated universe score variance is $\sigma^2(\alpha) + \sigma^2(\alpha\beta)/n_i$, and his coefficient is equivalent to Equation 22, $\mathcal{E}\hat\rho^2(S,I^*)$, in the special case where the G and D studies are identical.

Thrash and Porter (1974) have discussed two procedures for estimating the reliability of class means. The first of these procedures is to split each class into two random halves, calculate the correlation between the mean scores for the half-classes, and then use the Spearman-Brown formula to obtain the coefficient for full classes. It has already been shown that the expected value of coefficients calculated in this way, over all possible splits on classes, is given by Equation 22, $\mathcal{E}\hat\rho^2(S,I^*)$, which is equivalent to Wiley's coefficient.

The second procedure discussed by Thrash and Porter is to randomly split the test into two halves, correlate the half-test means for full classes, and then use the Spearman-Brown formula to obtain the coefficient for the full-length test. The expected value of this coefficient, over all random splits on the test, is $\mathcal{E}\rho^2(S^*,I)$. This procedure is implicitly generalizing over items but not over students. Because Thrash and Porter recommend the split-test procedure over the split-class procedure, we will refer to the split-test coefficient as Thrash and Porter's coefficient.

Of the four generalizability coefficients discussed earlier, three are directly related to coefficients that have been proposed for estimating the reliability of class means. The authors are not aware of any analysis of the dependability of class means that uses traditional reliability theory and develops a reliability estimate equivalent to $\mathcal{E}\rho^2(S,I)$.

The omission of $\mathcal{E}\rho^2(S,I)$ is not due to chance. Traditional reliability theory incorporates a univariate interpretation of error. The assumptions made about error variance differ somewhat, but the errors are always assumed to be drawn from some univariate distribution. Since $\mathcal{E}\rho^2(S,I^*)$, $\mathcal{E}\rho^2(S^*,I)$, and $\mathcal{E}\rho^2(S^*,I^*)$ all arise in the context of models where error is univariate, these coefficients are perfectly compatible with the framework of classical reliability theory. For $\mathcal{E}\rho^2(S,I)$, however, the appropriate model involves two distinct components of error whose separate contributions cannot be combined into a single univariate error term. Therefore, the appropriate model for $\mathcal{E}\rho^2(S,I)$ does not arise naturally within classical reliability theory.

Choice of Coefficient when
Class is the Unit of Analysis

The choice of an appropriate generalizability coefficient for a particular study depends upon the universe of generalization that is intended.

When class is the unit of analysis, it is difficult to conceive of situations in which the interpretation of the results of a research or evaluation study applies only to the students involved in the study. If the results of studies involving new curricula, teaching techniques, human learning, etc. are to have more than anecdotal interest, they must be generalizable to some universe of students beyond those who actually experienced the treatment under study. The intention to generalize to some larger universe of students is quite explicit whenever variation among students is used to estimate sampling error.

Also, it is usually inappropriate to restrict generalization over items to the particular finite set of items used in some study. However, in educational research and evaluation, it does sometimes happen that the set of items in the study exhausts the universe of behaviors that are of interest. In such cases, it is not appropriate to generalize to a wider universe of items. For example, if a dental hygiene program is intended to train children in the use of a few basic skills, then the items used to measure the effectiveness of the program might exhaust the universe of interest.

The above observations imply that, for describing the dependability of class means, $\mathcal{E}\rho^2(S,I)$ is usually the most appropriate of the four generalizability coefficients discussed in this paper. $\mathcal{E}\rho^2(S,I^*)$ appears to be appropriate in some cases; but $\mathcal{E}\rho^2(S^*,I)$ and $\mathcal{E}\rho^2(S^*,I^*)$ are seldom, if ever, appropriate. From this rationale we conclude that: (a) Wiley's coefficient, which is equivalent to $\mathcal{E}\rho^2(S,I^*)$, is appropriate in some cases; (b) Shaycroft's coefficient, which is an upper bound for $\mathcal{E}\rho^2(S,I^*)$ and a lower bound for $\mathcal{E}\rho^2(S^*,I^*)$, is perhaps appropriate in some cases; and (c) Thrash and Porter's coefficient is not likely to be appropriate unless one can make a strong argument for restricting generalization over the student facet.

In summary, clearly there is no universally best coefficient; the most appropriate coefficient can be identified only in the context of a particular study. However, we believe that $\mathcal{E}\rho^2(S,I)$ is, in most cases, the most appropriate coefficient to use. We also note that, from examination of Equations 6, 9, 12, and 15, the following relationships hold:

$$\mathcal{E}\rho^2(S,I) \le \mathcal{E}\rho^2(S,I^*) \le \mathcal{E}\rho^2(S^*,I^*) , \qquad (32)$$

$$\mathcal{E}\rho^2(S,I) \le \mathcal{E}\rho^2(S^*,I) \le \mathcal{E}\rho^2(S^*,I^*) , \qquad (33)$$

and $\mathcal{E}\rho^2(S,I^*)$ is greater than $\mathcal{E}\rho^2(S^*,I)$ if

$$\frac{n_s\,\sigma^2(\alpha\beta)}{n_i\,\sigma^2(\pi,\alpha\pi)} > 1 . \qquad (34)$$
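The criterion in Equation 34 can be compared with a direct calculation of the two coefficients. The sketch below uses two hypothetical sets of variance components, one on each side of the criterion; the function name and dictionary keys are ours.

```python
def compare_restricted_coefficients(var, n_s, n_i):
    """Return the two single-facet coefficients and the Equation 34 ratio."""
    observed = (var["class"] + var["student"] / n_s
                + var["class_item"] / n_i
                + var["residual"] / (n_i * n_s))
    students_random = (var["class"] + var["class_item"] / n_i) / observed  # (S,I*)
    items_random = (var["class"] + var["student"] / n_s) / observed        # (S*,I)
    ratio = n_s * var["class_item"] / (n_i * var["student"])               # Eq. 34
    return students_random, items_random, ratio


# Hypothetical components: student variance dominates in the first case,
# class by item variance dominates in the second.
case_a = {"class": 0.20, "student": 1.00, "class_item": 0.05, "residual": 0.60}
case_b = {"class": 0.20, "student": 0.50, "class_item": 0.80, "residual": 0.60}

for label, var in (("A", case_a), ("B", case_b)):
    s_istar, sstar_i, ratio = compare_restricted_coefficients(var, n_s=20, n_i=10)
    print(f"case {label}: ratio = {ratio:.2f}, "
          f"(S,I*) = {s_istar:.3f}, (S*,I) = {sstar_i:.3f}")
```

Because both coefficients share the same denominator, the comparison reduces to the two numerators, which is exactly what the ratio in Equation 34 expresses.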



Summary and Conclusions

Using generalizability theory in the context of a split-plot design we have developed and discussed four generalizability coefficients for describing the dependability of class means. We have shown that these coefficients can be obtained in three ways: (a) using variance components from a random effects analysis of variance; (b) using variance components from a mixed or fixed effects analysis of variance; and (c) calculating the expected value of particular correlation coefficients. These four generalizability coefficients have been compared to three previously reported statistics for estimating the reliability of class means. Confusion tends to arise because these reliability coefficients are characterized by different definitions of error. Furthermore, none of these three reliability coefficients is equivalent to the generalizability coefficient $\mathcal{E}\rho^2(S,I)$, which is, in our judgment, generally the most appropriate coefficient.

It is understandable that $\mathcal{E}\rho^2(S,I)$ has not been given much attention as a coefficient for describing the dependability of class means. The three previously reported reliability coefficients were developed using a univariate conception of error consistent with classical reliability theory. $\mathcal{E}\rho^2(S,I)$, however, depends upon a multivariate conception of error, which is not easily accommodated in classical reliability theory, but arises naturally in generalizability theory.

The generalizability coefficients developed here are descriptive statistics and do not depend upon any parametric assumptions about the distribution of errors. Such parametric assumptions need to be made if one wants to establish confidence intervals or perform statistical tests of significance. However, the advisability of performing such tests of significance is questionable. Even if an estimated variance component does not possess statistical significance, it is an unbiased estimate. As such, it is better to use it than to replace it by zero (Cronbach et al., 1972, pp. 192-193).

In this paper we have considered class as the unit of analysis in a split-plot design. That is, we have used the word "class" to indicate an aggregate unit of analysis with one level of nesting. The extension to multiple levels of nesting is a relatively straightforward application of the procedures discussed here (see Cronbach et al., 1972).

We have assumed throughout this paper that classes are a random effect. In our judgment, this assumption is generally valid. Nevertheless, the formulas for the four generalizability coefficients from the split-plot design are unchanged if we assume that classes are a fixed effect.

Also, in order to simplify the discussion, we have assumed an orthogonal split-plot design in which the number of students within class, $n_s$, is constant over all classes. Procedures for doing an analysis of variance for a split-plot design with unequal n's are available in most standard experimental design texts (e.g., Kirk, 1968, pp. 276-282; Winer, 1971, pp. 599-603).

Finally, we note the following recommendation from the most recent edition of Standards for Educational & Psychological Tests (APA, 1974): the "estimation of clearly labeled components of score variance is the most informative outcome of a reliability study, both for the test developer wishing to improve the reliability of his instrument and for the user desiring to interpret test scores with maximum understanding" (p. 49). This is equally true whether the unit of analysis is a person or an aggregate of persons, such as a class. If components of variance from a random effects G study are reported, then a number of generalizability (or reliability) coefficients are easily estimated, and a single generalizability study can replace a number of separate reliability studies.

Footnotes

Since the observed score ($X_{c\cdot\cdot}$) is the mean over $n_i$ items and $n_s$ students, the contributions of the various effects to the observed score variance (Equation 3) are reduced in accordance with the Central Limit Theorem.

Table 1

Split-Plot Design Analysis of Variance
All Effects Random

Source                  df                              Expected mean square (MS)

Classes (C)             $n'_c - 1$                      $\sigma^2(\beta\pi,\alpha\beta\pi,e) + n'_i\sigma^2(\pi,\alpha\pi) + n'_s\sigma^2(\alpha\beta) + n'_i n'_s\sigma^2(\alpha)$
Students (S)            $n'_c(n'_s - 1)$                $\sigma^2(\beta\pi,\alpha\beta\pi,e) + n'_i\sigma^2(\pi,\alpha\pi)$
Items (I)               $n'_i - 1$                      $\sigma^2(\beta\pi,\alpha\beta\pi,e) + n'_s\sigma^2(\alpha\beta) + n'_c n'_s\sigma^2(\beta)$
Classes x Items (CI)    $(n'_c - 1)(n'_i - 1)$          $\sigma^2(\beta\pi,\alpha\beta\pi,e) + n'_s\sigma^2(\alpha\beta)$
Residual (R)            $n'_c(n'_s - 1)(n'_i - 1)$      $\sigma^2(\beta\pi,\alpha\beta\pi,e)$

Estimated variance components:

$\hat\sigma^2(\beta\pi,\alpha\beta\pi,e) = MS(R)$
$\hat\sigma^2(\alpha\beta) = [MS(CI) - MS(R)]/n'_s$
$\hat\sigma^2(\pi,\alpha\pi) = [MS(S) - MS(R)]/n'_i$
$\hat\sigma^2(\alpha) = [MS(C) - MS(S) - MS(CI) + MS(R)]/n'_i n'_s$
$\hat\sigma^2(\beta) = [MS(I) - MS(CI)]/n'_c n'_s$
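
Table 2

Split-Plot Design Analysis of Variance
Classes and Students Random; Items Fixed

Source                  df                              Expected mean square (MS)

Classes (C)             $n'_c - 1$                      $n'_i\sigma^2_{I^*}(\pi,\alpha\pi) + n'_i n'_s\sigma^2_{I^*}(\alpha)$
Students (S)            $n'_c(n'_s - 1)$                $n'_i\sigma^2_{I^*}(\pi,\alpha\pi)$
Items (I)               $n'_i - 1$                      $\sigma^2_{I^*}(\mathrm{b}\pi,\alpha\mathrm{b}\pi,e) + n'_s\sigma^2_{I^*}(\alpha\mathrm{b}) + n'_c n'_s\sigma^2_{I^*}(\mathrm{b})$
Classes x Items (CI)    $(n'_c - 1)(n'_i - 1)$          $\sigma^2_{I^*}(\mathrm{b}\pi,\alpha\mathrm{b}\pi,e) + n'_s\sigma^2_{I^*}(\alpha\mathrm{b})$
Residual (R)            $n'_c(n'_s - 1)(n'_i - 1)$      $\sigma^2_{I^*}(\mathrm{b}\pi,\alpha\mathrm{b}\pi,e)$

Estimated variance components:

$\hat\sigma^2_{I^*}(\mathrm{b}\pi,\alpha\mathrm{b}\pi,e) = MS(R)$
$\hat\sigma^2_{I^*}(\alpha\mathrm{b}) = [MS(CI) - MS(R)]/n'_s$
$\hat\sigma^2_{I^*}(\pi,\alpha\pi) = MS(S)/n'_i$
$\hat\sigma^2_{I^*}(\alpha) = [MS(C) - MS(S)]/n'_i n'_s$
$\hat\sigma^2_{I^*}(\mathrm{b}) = [MS(I) - MS(CI)]/n'_c n'_s$

Note. Greek letters and e indicate random effects; unitalicized Latin letters indicate fixed effects.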
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